This document summarizes the data work done to correlate the occurrence of dry spells with different climatological indicators. The work was done for OCHA’s Anticipatory Action pilot on dry spells in Malawi. It explores the relationship with observed seasonal indicators, the skill of a seasonal and a 15-day forecast in predicting dry spells, and the ability to observe dry spells in near real time. All work presented here was done by the Centre for Humanitarian Data, with indispensable help from technical partners. All the code is openly available here and includes more details on the analyses presented in this document.
This work uses a list of historical dry spells and rainy seasons that was also created as part of this project. Further information on, and analyses of, the detection of these historical dry spells can be found here, and the list of historical events can be downloaded on HDX.
Ideally, the pre-alert of the trigger would be based on indicators with a long lead time, preferably 3 to 6 months. This would allow the organizations in country to implement a wider range of anticipatory actions. Most information provided with such a long lead time does not directly report the probability of dry spells in a certain location. Instead it reports a more general pattern, such as the ENSO state or the precipitation during a month. We therefore explored whether these broader properties, which are forecasted well ahead of the rainy season, correlate with the occurrence of a dry spell. This analysis looks solely at observational data, not at forecasts. Only if the correlations in the observed data turn out to be significant do we move on to forecasts, on the assumption that if there is no correlation in the observed data, there will be none in the forecasted data either.
The body of literature on the relation between long-term meteorological indicators and the occurrence of dry spells in Malawi is rather limited. We are aware of two studies that investigated the correlation of dry spells in Malawi with ENSO and 3-monthly precipitation, as well as temperature, wind speed, and wind direction. Mittal et al. (2021) investigated the correlation between total precipitation in a 3-month period and the occurrence of a prolonged dry spell in Malawi. Their work defines a dry spell as 14 consecutive dry days, where a dry day is a day with <=2mm of precipitation. They found only a weak correlation between the occurrence of a dry spell and the total 3-monthly precipitation, and the strength of the correlation depended heavily on the location. Streefkerk (2020) looked at the correlation between 5-day dry spells and the meteorological indicators of temperature, wind speed, wind direction and ENSO strength. It should be noted that a 5-day dry spell is a significantly different phenomenon from a 14-day dry spell, so the results might not be transferable. She showed that the chosen indicators have some predictive value for the occurrence of 5-day dry spells, although these correlations again depend heavily on the location. Moreover, the analysis suggests that while the ENSO phenomenon has predictive value for overall drought, it is less decisive for the occurrence of dry spells, as these are more local events.
A commonly used long-term meteorological indicator is the El Niño Southern Oscillation (ENSO) state. ENSO is a global phenomenon that causes seasonal climatological fluctuations and is related to Sea Surface Temperatures (SSTs). More background on the ENSO phenomenon can be found here. Our analysis explores the correlation between the observed ENSO state and the occurrence of a dry spell.
Several different indicators exist to measure the ENSO state, most of which are listed here. The two most commonly used indicators are the NINO3.4 index and ONI. They use slightly different data sources and aggregation methodologies, but show similar historical patterns. We chose ONI for our analysis, as it is commonly used to keep track of the current ENSO state. Either indicator could be used here, and no large differences in results are expected.
Malawi lies in a transition zone, where the impact of the ENSO phenomenon is mainly observed in the Southern region. In this region El Niño generally brings drier weather, while La Niña generally brings wetter weather. This is shown in the figure below, which is adapted from IRI by Ileen Streefkerk. The hypothesis is therefore that the El Niño state increases the likelihood of a dry spell occurring.
ONI is reported as a running mean over a 3-month period. Open data exists from 1950 till present. Warm periods are defined as an ONI greater than or equal to 0.5, while cold periods are defined as an ONI less than or equal to -0.5. If 5 or more consecutive overlapping periods reach the threshold for a warm period, these seasons are defined as experiencing the El Niño state. Similarly, 5 or more consecutive overlapping periods that reach the threshold for a cold period are defined as experiencing the La Niña state. All other seasons are defined as the Neutral state.
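The classification rule above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `classify_enso` and the example values are made up, and only the thresholds (0.5, -0.5) and the minimum run length of 5 come from the definition above.

```python
def classify_enso(oni, threshold=0.5, min_run=5):
    """Label each overlapping 3-month ONI value as 'El Niño', 'La Niña' or 'Neutral'."""
    states = ["Neutral"] * len(oni)

    def mark_runs(is_extreme, label):
        start = None
        # Iterate one step past the end so a run reaching the last value is closed.
        for i in range(len(oni) + 1):
            if i < len(oni) and is_extreme(oni[i]):
                if start is None:
                    start = i
            else:
                # A run only counts if it spans at least `min_run` consecutive periods.
                if start is not None and i - start >= min_run:
                    for j in range(start, i):
                        states[j] = label
                start = None

    mark_runs(lambda v: v >= threshold, "El Niño")
    mark_runs(lambda v: v <= -threshold, "La Niña")
    return states


# Made-up ONI values, for illustration only.
example_oni = [0.2, 0.5, 0.7, 0.9, 0.8, 0.6, 0.3, -0.6, -0.7, -0.2]
example_states = classify_enso(example_oni)
```

In this example the five consecutive values from 0.5 to 0.6 are labelled El Niño, while the two cold values stay Neutral because the run is shorter than 5 periods.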
For each 3-month period, an ENSO state was assigned to the ONI data based on the definition described above. A 3-month period was marked as having experienced a dry spell if a dry spell started during the middle month of that period. For example, if a dry spell started on 15-03-2010, the FMA 2010 season was marked as having experienced a dry spell. Since our main period of interest is the rainy season, the analysis focuses on it. The rainy season was defined quite broadly, including all seasons from SON to MAM. Lastly, since the main effect of the ENSO phenomenon is expected in the Southern region, we only included the dry spells that occurred in this region (which account for the largest fraction of all dry spells).
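As a small sketch of the middle-month rule, the snippet below maps a dry spell start date to its 3-month season label. The names `season_of_start` and `RAINY_SEASONS` are hypothetical, and tagging the season with the year of its middle month is an assumption; the source only gives the FMA 2010 example.

```python
from datetime import date

MONTH_INITIALS = "JFMAMJJASOND"
# Seasons from SON to MAM, following the broad rainy season definition above.
RAINY_SEASONS = {"SON", "OND", "NDJ", "DJF", "JFM", "FMA", "MAM"}


def season_of_start(start):
    """Return (label, year) of the 3-month season whose middle month contains `start`."""
    m = start.month              # 1..12
    prev_idx = (m - 2) % 12      # zero-based index of the previous month
    next_idx = m % 12            # zero-based index of the next month
    label = MONTH_INITIALS[prev_idx] + MONTH_INITIALS[m - 1] + MONTH_INITIALS[next_idx]
    # Assumption: the season is tagged with the year of its middle month.
    return label, start.year


label, year = season_of_start(date(2010, 3, 15))
in_rainy_season = label in RAINY_SEASONS
```

A start date of 15-03-2010 yields the label FMA with year 2010, matching the example in the text.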
Below the ONI values per rainy season are shown, where a red bar indicates the occurrence of a dry spell. From here we can already see that not all dry spells occur during an extreme ONI state, and not all extreme ONI states co-occur with a dry spell.
We now classify the ONI values into the three states: El Niño, La Niña and Neutral. We then analyze what percentage of the occurrences of each state co-occurred with a dry spell (first graph), and how the dry spells are divided across the states (second graph). The first relates to false alarms and the second to the rate of detection if an ENSO state were used as an indicator.
From these graphs we can conclude that:
Given this division of dry spells across ENSO states, we conclude that the ENSO state is not a reliable indicator of the presence of dry spells. It is especially surprising that the Neutral state shows the most correspondence with dry spells, while the literature suggests the El Niño state should show a higher correlation.
We also analyzed whether more extreme ONI values, instead of only the ENSO state, showed a higher correlation, but this was not the case. Moreover, we analyzed the results per 3-month period, but no large differences were seen here either.
Because we did not see a relationship between the observed ENSO state and dry spells, we didn’t move on to analyze the forecasts, on the assumption that this relationship will only be weaker. For those interested in the forecasts, the most commonly used forecast is the one produced by IRI, of which the historical data can also be downloaded.
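The two percentages behind the graphs discussed above could be computed along these lines. This is only a sketch: the function name `summarise_by_state` and the toy seasons are made up and do not reproduce the project's data.

```python
from collections import Counter


def summarise_by_state(seasons):
    """seasons: list of (enso_state, had_dry_spell) tuples, one per 3-month period."""
    n_seasons = Counter(state for state, _ in seasons)
    n_dry = Counter(state for state, dry in seasons if dry)
    total_dry = sum(n_dry.values())
    # Share of each state's seasons that co-occurred with a dry spell (first graph).
    share_with_dry_spell = {s: n_dry[s] / n for s, n in n_seasons.items()}
    # Share of all dry spells that fell in each state (second graph).
    share_of_dry_spells = {
        s: (n_dry[s] / total_dry if total_dry else 0.0) for s in n_seasons
    }
    return share_with_dry_spell, share_of_dry_spells


# Made-up seasons, for illustration only (not the project's data).
toy_seasons = (
    [("El Niño", True)] * 1 + [("El Niño", False)] * 3
    + [("Neutral", True)] * 2 + [("Neutral", False)] * 2
    + [("La Niña", False)] * 2
)
per_state, per_dry_spell = summarise_by_state(toy_seasons)
```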
Another commonly used forecast source with a long lead time is the so-called seasonal forecast. Here a season is defined as a 3-month period. Seasonal precipitation forecasts are often used to inform predictions related to seasonal drought and, depending on the source, include forecasts for 1 to 6 months ahead.
Seasonal precipitation forecasts exist in different formats. The most common format is the tercile-based forecast, where the three terciles are referred to as below-average, normal, and above-average precipitation. These forecasts report, per raster cell, the probability that the precipitation falls in each tercile. See here for a clear resource on the usage of tercile-based forecasts. The Malawi Met services (DCCMS) provide their forecast in the tercile format, as do global organizations such as IRI and NMME.
While we do investigate the relationship of dry spells with seasonal below-average precipitation below, by definition the occurrence of below-average precipitation is not expected to have a strong relation with dry spells. This is the case for two reasons, namely
CHIRPS was used as the data source to compute the monthly and 3-monthly total precipitation, and the occurrences of 3-monthly below-average precipitation. CHIRPS was chosen because it is the same source that was used to detect observed dry spells, which eliminates any weakening of the relationship due to biases between different sources. For more information about CHIRPS, see the documentation on defining observed dry spells.
We work with the monthly precipitation sums as directly provided by CHC on their FTP server. From these we compute the 3-monthly sum per raster cell. Thereafter we compute for each cell whether it experienced below-average precipitation. The code and further explanation can be found here. Once we knew whether a raster cell had below-average precipitation during a given season, we also aggregated this information to admin2 level.
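For a single raster cell, these two steps might look like the sketch below. It assumes that "below-average" means falling below the lower tercile of that season's totals across years, in line with the tercile terminology used earlier; the exact rule is in the project code, not in this document. The function names are hypothetical, and the tercile bound comes from Python's `statistics.quantiles` with its default method.

```python
import statistics


def three_month_totals(monthly_mm):
    """Rolling 3-month totals; entry i covers months i, i+1 and i+2."""
    return [sum(monthly_mm[i:i + 3]) for i in range(len(monthly_mm) - 2)]


def below_average_flags(season_totals_by_year):
    """Flag years whose seasonal total falls below the lower tercile bound.

    Assumption: the climatology is the set of totals for the same season
    across all available years.
    """
    lower_bound = statistics.quantiles(season_totals_by_year, n=3)[0]
    return [total < lower_bound for total in season_totals_by_year]


# Made-up values in mm, for illustration only.
toy_totals = three_month_totals([10, 20, 30, 40])
toy_flags = below_average_flags([100, 200, 300, 400, 500, 600])
```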
For the aggregation to admin2 level, we used a percentage-based approach, since whether a raster cell had below-average precipitation is a binary variable and taking the mean would therefore not be appropriate. We classified an admin2 as having observed below-average precipitation if at least 50% of its raster cells received below-average precipitation.
A 3-month period was marked as having experienced a dry spell if a dry spell started during any of the 3 months in the given admin2. This methodology is thus slightly different from the one used in the ENSO analysis, where a dry spell was only assigned to a season if it started during the middle month of that season.
Below the confusion matrix is shown, which indicates the co-occurrence of dry spells and below-average precipitation. As can be seen, this co-occurrence is weak. Only 55% (52/94) of the seasons with a dry spell also had below-average precipitation. Moreover, in 89% of the 3-month periods with below-average precipitation, no dry spell occurred. Further analysis was done to determine whether stronger correlations occur in certain admin2’s or during specific periods. This analysis didn’t show a significantly stronger signal.
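A minimal sketch of this percentage-based aggregation, with a hypothetical function name and made-up cell flags:

```python
def admin2_below_average(cell_flags, min_fraction=0.5):
    """True if at least `min_fraction` of the admin2's raster cells are below average."""
    if not cell_flags:
        raise ValueError("an admin2 needs at least one raster cell")
    return sum(cell_flags) / len(cell_flags) >= min_fraction


# Exactly half of the cells are below average, so the 50% rule is met.
half_flagged = admin2_below_average([True, True, False, False])
one_third_flagged = admin2_below_average([True, False, False])
```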
Based on this confusion matrix and the other analyses, we conclude that the occurrence of below-average precipitation is not a good indicator for the likelihood of a dry spell occurring. We therefore didn’t move on to analyze the performance of seasonal forecasts instead of observations.
[Figure: seasonal below-average precipitation confusion matrix]
Besides the long-term tercile forecasts, some organizations also provide absolute forecasted rainfall (in mm) with a lead time of several months. This rainfall is forecasted either as an expected amount per day or per month. The main organizations providing this data are ECMWF and the UK Met Office. From the daily projected amounts, one could in theory directly forecast the occurrence of a dry spell. However, since those forecasts have such a large uncertainty, it is very unlikely that the forecast will predict the occurrence of a dry spell by calendar day. Nevertheless, one can define other aggregated measures that might correlate with the occurrence of a dry spell. In this analysis we investigate the relationship of dry spells with total monthly precipitation.
CHIRPS was used as data source, for the same reasons it was used to compute the seasonal precipitation.
The CHIRPS data directly contains observed monthly rainfall. This was aggregated from raster cell to admin1 by taking the mean value of all cells within the admin1. We aggregated to admin1 rather than admin2 because of the high spatial uncertainty in forecasts. These forecasts are meant to indicate general regional patterns, but are not able to predict whether the rain will fall in one city or the next. Admin1 is therefore a more appropriate spatial level at which to analyze whether observed rainfall is a decent indicator for dry spells, as this is the information we need if we move to forecasted data.
Moreover, we had to aggregate the dry spell data, which is at admin2, to admin1. We classified an admin1 as experiencing a dry spell if at least 3 admin2’s experienced a dry spell during that time.
Lastly, we had to convert the daily dry spell data to monthly dry spell data. We classified a month as experiencing a dry spell if at least 7 days of that month were part of a dry spell.
Before looking at the correlations, it is important to note that the methodological choices significantly decreased the number of dry spells. This is because we only look at a part of the country, aggregate to admin1, and focus on 3 months of the year. This leaves us with 3 dry spells in December, 1 in January, and 3 in February from 2000 till 2020. This is very few, so the results presented below are not statistically significant. However, it is the best we can do with the data we have.
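The two aggregation rules can be illustrated as below. This is a sketch with hypothetical function names and made-up inputs; how the two steps are combined in the actual pipeline is not specified here, so they are shown as independent helpers.

```python
from collections import Counter
from datetime import date, timedelta


def dry_spell_months(start, end, min_days=7):
    """(year, month) pairs with at least `min_days` days inside the dry spell [start, end]."""
    days_per_month = Counter()
    day = start
    while day <= end:
        days_per_month[(day.year, day.month)] += 1
        day += timedelta(days=1)
    return {ym for ym, n in days_per_month.items() if n >= min_days}


def admin1_has_dry_spell(admin2_flags, min_admin2=3):
    """admin2_flags: mapping of admin2 name -> whether it had a dry spell in the period."""
    return sum(admin2_flags.values()) >= min_admin2


# A made-up dry spell with 5 days in January and 12 days in February 2010.
toy_months = dry_spell_months(date(2010, 1, 27), date(2010, 2, 12))
# Placeholder admin2 names, for illustration only.
toy_admin1 = admin1_has_dry_spell({"A": True, "B": True, "C": True, "D": False})
```

Only February is flagged in this example, because January contains fewer than 7 dry spell days.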
Below the distribution of monthly precipitation, with and without dry spells is shown
From this figure we can see that
During December the distinction between months with and without dry spells is less clear. This is largely because the rainy season often only starts in December. We therefore decided to focus only on January and February.
We investigated different thresholds of monthly precipitation and the ability to classify dry spells based on these thresholds. The figure below shows the misses and false alarms of dry spells for different thresholds per month. We can see that the distinction is fairly good, though we are only working with 4 dry spell events here, so the statistical significance is very low.
NOTE: false alarms is here FP/(FP+TN), i.e. the percentage of months without a dry spell in which the monthly precipitation was below x mm.
If we were to use monthly precipitation for a trigger, we would want one threshold for both months. We therefore combine the months, as shown below.
Based on that figure, we computed the confusion matrix at a threshold of 170mm, since this is where the lines of misses and false alarms cross. We can see that all occurrences of dry spells had <=170 mm. However, 6 months without a dry spell also saw <=170 mm rainfall, which would mean a false alarm rate of 17% (6/36).
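To make the two rates concrete, the sketch below counts the confusion matrix cells at a given threshold and computes the miss rate as FN/(FN+TP) and the false alarm rate as FP/(FP+TN). The function names are hypothetical. The synthetic months carry invented precipitation values chosen only so that the counts match those reported above (4 dry spell months, all at or below 170 mm, and 6 of the 36 months without a dry spell at or below 170 mm).

```python
def confusion_counts(months, threshold_mm):
    """months: iterable of (precip_mm, had_dry_spell); returns (tp, fp, fn, tn)."""
    tp = fp = fn = tn = 0
    for precip_mm, had_dry_spell in months:
        below = precip_mm <= threshold_mm
        if below and had_dry_spell:
            tp += 1
        elif below:
            fp += 1
        elif had_dry_spell:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def miss_rate(tp, fn):
    return fn / (tp + fn)


def false_alarm_rate(fp, tn):
    return fp / (fp + tn)


# Synthetic values: only the resulting counts mirror the text, not the rainfall amounts.
synthetic_months = [(100.0, True)] * 4 + [(150.0, False)] * 6 + [(250.0, False)] * 30
tp, fp, fn, tn = confusion_counts(synthetic_months, threshold_mm=170)
far = false_alarm_rate(fp, tn)  # 6 / 36
```

Sweeping `threshold_mm` over a range of values and plotting both rates would give the kind of curves whose crossing point was used to pick the threshold.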
Given this fairly good correspondence, we decided to also analyze the skill of monthly forecasts and their ability to detect dry spells. This is explained in the next section.
Below a heatmap is shown, which shows the dates during which a dry spell and/or <=170 mm was observed.
As described in the previous section, the long-term indicators were shown not to have a strong correlation with the occurrence of a dry spell. We therefore move on to forecasts that can directly forecast dry spells, instead of predicting other indicators such as the ENSO state. Since forecasting skill generally decreases as the lead time increases, the first forecast that was analyzed was the one with the shortest lead time, namely 15 days ahead. This would mean that the alert could be triggered at the start of a dry spell.
We are not aware of any work attempting to forecast prolonged dry spells in Malawi. Moving away from Malawi, Gbangou et al. (2020) researched the predictability of dry spells in Ghana. They analyze how well the forecasts can predict the number of dry spells within a season, where a dry spell is defined as at least 5 consecutive days with less than 1mm of rainfall per day. Moreover, they try to forecast the length of the longest dry spell. For the forecasting, ECMWF’s seasonal forecast as well as a statistical model based on Sea Surface Temperatures (SST) is used. They show that the skill of these two forecasts depends on the lead time and location. However, the correlations they found are generally weak. Interestingly, the forecasts prove to be better at predicting the extreme years, though the correlations are still not strong. Nevertheless, they argue that the forecasts have better skill than guessing based on climatologies and thus can be used to inform actions. Similar work has been done by Surmaini et al. (2021), focusing on Indonesia and using NOAA’s CFSv2 seasonal forecast model. They found somewhat higher correlations, which again depend heavily on location. Because the climate is completely different, these results are not directly transferable to Malawi.
A few organizations publish forecasts of monthly precipitation with a lead time of several months. For this analysis we chose ECMWF’s seasonal forecast, as this is an often used and trusted source.
ECMWF releases a forecast each month. This forecast includes projected total precipitation per month for 1 to 6 months ahead. A lead time of 1 month refers to the month in which the forecast is released, and the release date is always the 13th of the month. That is, the forecast with a 1-month lead time only becomes available when we are already about two weeks into that month.
ECMWF’s forecast is a probabilistic forecast, meaning it consists of several members (=models), each with its own projected precipitation. The share of members that project a given event is what this document refers to as % of members, and it can be interpreted as the probability of that event occurring.
For the computation of observed dry spells and monthly precipitation the same method was used as in the observed section. I.e. the numbers were aggregated to admin1 level. Again we only look at the Southern region and only the months of January and February.
ECMWF’s forecast is produced as a raster, with a low resolution of 1 degree. This raster is upsampled to have the same resolution as the CHIRPS data. Thereafter the mean is taken of all cells with their centre in the Southern region. This is done per ensemble member.
Two parameters have to be set to arrive at a trigger, namely the cap of forecasted monthly precipitation in mm, and the probability that the forecast predicts the precipitation will be below that cap. One of these two numbers has to be set first. We chose to first set the probability and thereafter determine the optimum cap. A probability of 50% was chosen as this is a clearly interpretable probability, and is relatively high.
NOTE: we also experimented with other probabilities, but this didn’t lead to better results.
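How the two parameters interact can be sketched as follows. The function names and member values are made up; treating "below the cap" as less than or equal to the cap is an assumption, and the input is assumed to be one regionally averaged monthly total per ensemble member.

```python
def fraction_members_below(member_precip_mm, cap_mm):
    """Share of ensemble members projecting monthly precipitation at or below the cap."""
    return sum(p <= cap_mm for p in member_precip_mm) / len(member_precip_mm)


def trigger_met(member_precip_mm, cap_mm, min_probability=0.5):
    """True if the % of members below the cap reaches the chosen probability."""
    return fraction_members_below(member_precip_mm, cap_mm) >= min_probability


# Made-up member values in mm for one month and one lead time.
toy_members = [180, 190, 205, 220, 240, 260]
toy_fraction = fraction_members_below(toy_members, cap_mm=210)  # 3 of 6 members
toy_trigger = trigger_met(toy_members, cap_mm=210)
```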
The figure below shows the distribution of probabilities with and without a dry spell per lead time. From this figure it can be seen that the months with and without a dry spell are a lot less separable than with the observed data. Only the data with a lead time of 1 month shows high separability, but this information is a lot less usable, since it is only released midway through the month it predicts.
The figure below shows the miss and false alarm rate for different mm thresholds across all lead times. Based on this we set the threshold at the point where the false alarm and miss rate intersect across all lead times. This is at 210 mm.
NOTE: false alarms is here FP/(FP+TP), i.e. the percentage of months in which the forecasted monthly precipitation was below x mm but no dry spell occurred.
With the threshold at 210 mm, we can compute the confusion matrix per lead time.
Within the team it was decided that the most suitable leadtimes are 2 and 4 months. A leadtime of 2 months means that a forecast published mid December, forecasts for January. A leadtime of 4 months means that a forecast published mid October, forecasts for January.
For the lead times of 2 and 4 months we compute the confusion matrix per month, to understand whether there are large differences between the months. If we combine the numbers for January and February, we can see that with a lead time of 4 months, 75% (3/4) of the dry spells are forecasted, but 86% (19/22) of the activations would be false alarms. With a lead time of 2 months, 50% (2/4) of the dry spells would be forecasted, and 83% (11/13) of the activations would be false alarms. We can thus conclude that we detect most of the dry spells, but this comes at a very high false alarm rate.
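The percentages for the 4-month lead time follow directly from the counts quoted above, as this small sketch shows. The function names are hypothetical. Note that false alarms are expressed here as a share of activations, FP/(FP+TP), rather than as a share of months without a dry spell.

```python
def detection_rate(tp, fn):
    """Share of dry spells that were forecasted."""
    return tp / (tp + fn)


def false_alarm_ratio(tp, fp):
    """Share of activations that did not coincide with a dry spell."""
    return fp / (fp + tp)


# Counts implied by the text for a 4-month lead time:
# 3 of 4 dry spells forecasted, 19 of 22 activations without a dry spell.
tp, fn, fp = 3, 1, 19
detected_share = detection_rate(tp, fn)
false_alarm_share = false_alarm_ratio(tp, fp)
```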
To understand the occurrence of the threshold being met and dry spells across time, we can look at the heatmaps.
From this analysis we can conclude that the skill of the monthly forecast in detecting dry spells is limited. With the current threshold of 210 mm we detect most of the dry spells, but this comes at the cost of a very high false alarm rate. Especially with a lead time of 4 months, the trigger would be reached in February in 80% (16/20) of the years.
NOTE: the threshold could also be lowered, which would result in fewer false alarms but also a lower detection rate.
Several organizations produce forecasts with a 15-day lead time, but most of them are not openly available. For this analysis, we used CHIRPS-GEFS. This is a forecast produced by GEFS and bias-corrected to the CHIRPS data. This forecast was chosen because it is openly available, has a long historical record, is well-acknowledged, and is bias-corrected to the same data that was used to determine observed dry spells. CHIRPS-GEFS is available as raster data at 0.05 degree resolution. A forecast is produced each day, and these forecasts are available from 2000 till present, with a data gap in 2020. Each forecast indicates the projected cumulative precipitation during the next 15 days per raster cell.
Because of the shorter lead time, we assumed the spatial accuracy is high enough to analyze the skill of the forecast at admin2. The raster cell values of the CHIRPS-GEFS forecast were aggregated to admin2 by taking the mean value across all cells within the admin2.
We first analyzed the general performance of CHIRPS-GEFS by computing a bias plot. This plot shows on the x-axis the observed precipitation over 15 days per admin2, retrieved from CHIRPS data, and on the y-axis the forecasted minus observed 15-day precipitation. If the forecast were perfect, all values would lie on the horizontal line at zero. However, from this plot we can see that CHIRPS-GEFS has the tendency to overpredict low amounts of rainfall, while underpredicting high amounts of rainfall. Since we are interested in extremely low amounts of rainfall, this can be problematic for the skill in forecasting dry spells.
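The quantity on the y-axis, and a per-bin summary of it, could be computed as in the sketch below. The function name, bin edges and value pairs are all made up for illustration; they only mimic the qualitative pattern described above and are not CHIRPS-GEFS data.

```python
import statistics


def median_bias_per_bin(pairs, bin_edges):
    """pairs: (observed_mm, forecast_mm); bins of observed rainfall are half-open [lower, upper)."""
    result = {}
    for lower, upper in zip(bin_edges[:-1], bin_edges[1:]):
        # Bias is forecasted minus observed 15-day precipitation.
        biases = [fc - obs for obs, fc in pairs if lower <= obs < upper]
        if biases:
            result[(lower, upper)] = statistics.median(biases)
    return result


# Made-up (observed, forecast) pairs in mm per 15 days.
toy_pairs = [(0, 20), (1, 30), (1.5, 25), (60, 50), (80, 55), (100, 90)]
toy_bias = median_bias_per_bin(toy_pairs, bin_edges=[0, 2, 50, 200])
```

A positive median in the lowest bin indicates overprediction of low rainfall, and a negative median in a higher bin indicates underprediction.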
Despite the bias, the forecast might still be good enough to detect dry spells. Unfortunately, this is not the case, as can be seen from the figure. The forecasted dry spells very often do not overlap with the observed dry spells, and their timing is not even close to that of the observed ones.
We define a dry spell as “detected” if any part of the observed dry spell overlaps with any part of the forecasted dry spell. This is thus a very loose definition. First, a forecasted dry spell was defined as a forecast projecting less than 2mm of cumulative rainfall over 15 days. The confusion matrix for this is shown below, from which it can be seen that the performance is very poor. Only 10% (4/39) of the dry spells are detected. Moreover, there are many false alarms, namely 99, which leads to a false alarm rate of 96% (99/103).
Because of the tendency of CHIRPS-GEFS to overpredict, we also tested different thresholds of forecasted rainfall for classifying a forecast as a dry spell. Since the median overprediction for 0-2mm of observed rainfall is around 25mm, it is expected that with a forecasted threshold of 25mm most dry spells will be detected. As can be seen in the confusion matrix, this is indeed the case. However, this comes with the large drawback of many more false alarms (734).
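The loose overlap rule can be sketched as follows. The function names and dates are made up, each dry spell is represented as an inclusive (start, end) pair, and counting a forecasted dry spell as a false alarm when it overlaps no observed dry spell is an assumption about how the false alarms were counted.

```python
from datetime import date


def overlaps(a, b):
    """True if the inclusive date ranges a and b share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def match_dry_spells(observed, forecasted):
    detected = sum(1 for o in observed if any(overlaps(o, f) for f in forecasted))
    # Assumption: a false alarm is a forecasted dry spell overlapping no observed one.
    false_alarms = sum(
        1 for f in forecasted if not any(overlaps(f, o) for o in observed)
    )
    return {
        "detected": detected,
        "missed": len(observed) - detected,
        "false_alarms": false_alarms,
    }


# Made-up dry spells, for illustration only.
toy_observed = [(date(2011, 1, 5), date(2011, 1, 20)), (date(2012, 2, 1), date(2012, 2, 16))]
toy_forecasted = [(date(2011, 1, 18), date(2011, 2, 1)), (date(2013, 1, 1), date(2013, 1, 15))]
toy_result = match_dry_spells(toy_observed, toy_forecasted)
```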
From these results we conclude that CHIRPS-GEFS is not a suitable forecast to predict dry spells in Malawi.
The last analysis we did was a comparison of two sources of observed precipitation. Throughout this project we have used CHIRPS as the source of observed precipitation. While it shows generally accurate results, one disadvantage of CHIRPS is the long time lag of about 1.5 months in publishing the results. Discussions have been held on using observational data as part of the trigger, but CHIRPS is not suitable for this purpose. We therefore also looked into the ability of ARC2 to detect dry spells. ARC2 only has a time lag of a few days. A rough comparison is shown in the heatmap below. From here we can see that the two sources correspond to a large extent. They don’t fully overlap, but the severe dry spell events were detected by both sources. We therefore argue that ARC2 would be a valid source to use for more real-time monitoring of dry spells.
Gbangou, Talardia, Fulco Ludwig, Erik van Slobbe, Wouter Greuell, and Gordana Kranjac-Berisavljevic. 2020. “Rainfall and Dry Spell Occurrence in Ghana: Trends and Seasonal Predictions with a Dynamical and a Statistical Model.” Theoretical and Applied Climatology 141 (1): 371–87.
Mittal, Neha, Edward Pope, Stephen Whitfield, James Bacon, Marta Bruno Soares, Andrew J Dougill, Marc van den Homberg, et al. 2021. “Co-Designing Indices for Tailored Seasonal Climate Forecasts in Malawi.” Frontiers in Climate 2: 30.
Streefkerk, Ileen. 2020. “Linking Drought Forecast Information to Smallholder Farmer’s Agricultural Strategies and Local Knowledge in Southern Malawi.”
Surmaini, E, E Susanti, MR Syahputra, FR Fajary, and others. 2021. “Use of the Dry-Spell Seasonal Forecast in Crop Management Decisions.” In IOP Conference Series: Earth and Environmental Science, 648:012092. 1. IOP Publishing.